The point of these observations
is not the reduction of the 
familiar to the unfamiliar [...] 
but the extension of the familiar 
to cover many more cases. 
Saunders MacLane 
Categories for the Working Mathematician [14]

Page 226. 

Abstract 

Following F. William Lawvere, we show that many self-referential para- 
doxes, incompleteness theorems and fixed point theorems fall out of the 
same simple scheme. We demonstrate these similarities by showing how 
this simple scheme encompasses the semantic paradoxes, and how they 
arise as diagonal arguments and fixed point theorems in logic, computabil- 
ity theory, complexity theory and formal language theory. 



1 Introduction 

In 1969, F. William Lawvere wrote a paper in which he showed how to 
describe many of the classical paradoxes and incompleteness theorems in a cat- 
egorical fashion. He used the language of category theory (and of cartesian 
closed categories in particular) to describe the setting. In that paper he showed 
that in a cartesian closed category satisfying certain conditions, paradoxical 
phenomena can occur. Lawvere then went on to demonstrate this scheme by 
showing the following examples 

1. Cantor's theorem that $\mathbb{N} \lneqq \wp(\mathbb{N})$

2. Russell's paradox 

3. The non-definability of satisfiability 

4. Tarski's non-definability of truth and 

5. Gödel's first incompleteness theorem.

Further work along these lines was done in several papers, e.g. |51 ll7l[Tmi2(J| .

Unfortunately, Lawvere's paper has been overlooked by many people both inside 
and outside of the category theory community. Lawvere and Schanuel revisited 
these ideas in Session 29 of their book [13]. Recently, Lawvere and Robert
Rosebrugh came out with a book, Sets for Mathematics [12], which also has a
few pages on this scheme. 

It is our goal to make these amazing results available to a larger audience. 
Towards this aim we restate Lawvere's theorems without using the language of 
category theory. Instead, we use sets and functions. The main theorems and 
their proofs are done at tutorial speed. We generalize one of the theorems and 
then we go on to show different instances of these results. In order to demonstrate
the ubiquity of the theorems, we have tried to bring examples from many diverse 
areas of logic and theoretical computer science. 

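Classically, Cantor proved that there is no onto function (surjection)

$$\mathbb{N} \longrightarrow \mathbf{2}^{\mathbb{N}} \cong \wp(\mathbb{N})$$

where $\mathbf{2}^{\mathbb{N}}$ is the set of functions from $\mathbb{N}$ to $\mathbf{2} = \{0, 1\}$.
$\mathbf{2}^{\mathbb{N}}$ is the set of characteristic functions on the set $\mathbb{N}$ and is
equivalent to the powerset of $\mathbb{N}$. We can generalize Cantor's theorem to show
that for any set $T$ there is no onto function

$$T \longrightarrow \mathbf{2}^{T} \cong \wp(T).$$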

The same theorem is also true for other sets besides $\mathbf{2}$, e.g. $\mathbf{3} = \{0, 1, 2\}$ or
$\mathbf{23} = \{0, 1, 2, \ldots, 21, 22\}$. The theorem is not true for the set $\mathbf{1} = \{0\}$. In
general we can replace $\mathbf{2}$ with an arbitrary "non-degenerate" set $Y$. From this
generalization, the basic statement of Cantor's theorem roughly says that if $Y$
is "non-degenerate" then there is no onto function

$$T \longrightarrow Y^{T}$$

where $Y^{T}$ is the set of functions from $T$ to $Y$. $Y$ can be thought of as the set of
possible "truth-values" or "properties" of elements of $T$. By "non-degenerate"
we mean that the objects of $Y$ can be interchanged, or that there exists a function
$\alpha$ from $Y$ to $Y$ without any fixed points (a fixed point is a $y \in Y$ where $\alpha(y) = y$).

Rather than looking at functions $f : T \to Y^{T}$, we shall look at equivalent
functions of the form $\hat{f} : T \times T \to Y$. Every $f$ can be converted to a function
$\hat{f}$ where $\hat{f}(t, t') = f(t')(t) \in Y$. Saying that $f$ is not onto is the same thing
as saying that there exists a $g(-) \in Y^{T}$ such that for all $t' \in T$ the function
$f(t') = \hat{f}(-, t') : T \to Y$ is not the same as the function $g(-) : T \to Y$. In
other words, for each $t'$ there exists a $t \in T$ such that

$$g(t) \neq \hat{f}(t, t').$$

We shall call a function $g : T \to Y$ "representable by $t_0$" if $g(-) = \hat{f}(-, t_0)$.
So if $f$ is not onto, then there exists a $g(-) \in Y^{T}$ that is not representable by
any $t \in T$.
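
The passage from a function into $Y^{T}$ to a two-variable function, and the notion of representability, can be tried out on small finite sets. The following is only a sketch: the names `uncurry` and `represented_by` and the particular `f` are illustrative assumptions, not notation from the paper.

```python
def uncurry(f):
    """Turn f : T -> Y^T into f_hat : T x T -> Y with f_hat(t, t') = f(t')(t)."""
    return lambda t, t_prime: f(t_prime)(t)


def represented_by(g, f_hat, T):
    """Return every t0 in T such that g(-) = f_hat(-, t0) on all of T."""
    return [t0 for t0 in T if all(g(t) == f_hat(t, t0) for t in T)]


T = [0, 1, 2]

# Hypothetical f : T -> Y^T with Y = {0, 1}: f(t') is the indicator of "t <= t'".
f = lambda t_prime: (lambda t: 1 if t <= t_prime else 0)
f_hat = uncurry(f)

g_yes = lambda t: 1 if t <= 1 else 0   # the same function as f(1)
g_no = lambda t: 1 if t == 2 else 0    # differs from f(0), f(1) and f(2)

print(represented_by(g_yes, f_hat, T))  # g_yes is representable by 1
print(represented_by(g_no, f_hat, T))   # g_no is representable by no element of T
```

Here `g_no` witnesses that this particular `f` is not onto.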

On a philosophical level, this generalized Cantor's theorem says that as long
as the truth-values or properties of T are non-trivial, there is no way that a 
set T of things can "talk about" or "describe" their own truthfulness or their 
own properties. In other words, there must be a limitation in the way that T 
deals with its own properties. The Liar paradox is the three thousand year-old 
primary example that shows that natural languages should not talk about their 
own truthfulness. Russell's paradox shows that naive set theory is inherently 
flawed because sets can talk about their own properties (membership). Gödel's
incompleteness results show that arithmetic can not talk completely about
its own provability. Turing's Halting problem shows that computers can not 
completely deal with the property of whether a computer will halt or go into 
an infinite loop. All these different examples are really saying the same thing: 
there will be trouble when things deal with their own properties. It is with this 
in mind that we try to make a single formalism that describes all these diverse 
- yet similar - ideas. 

The best part of this unified scheme is that it shows that there are really no 
paradoxes. There are limitations. Paradoxes are ways of showing that if you 
permit one to violate a limitation, then you will get an inconsistent system.
The Liar paradox shows that if you permit natural language to talk about its 
own truthfulness (as it - of course - does) then we will have inconsistencies in 
natural languages. Russell's paradox shows that if we permit one to talk about 
any set without limitations, we will get an inconsistency in set theory. This is 
exactly what is said by Tarski's theorem about truth in formal systems. Our 
scheme shows the inherent limitations of all these systems. The constructed $g$
is, in some sense, the limitation that your system ($f$) can not deal with. If the
system does deal with the $g$, there will be an inconsistency (fixed point).

The contrapositive of Cantor's theorem says that if there is an onto function
$T \to Y^{T}$ then $Y$ must be "degenerate", i.e. every map from $Y$ to $Y$ must have a
fixed point. In other words, if $T$ can talk about or describe its own properties then
Y must be faulty in some sense. This "degenerate" -ness is a way of producing 
fixed point theorems. 

For pedagogical reasons, we have elected not to use the powerful language 
of category theory. This might be an error. Without using category theory we 
might be skipping over an important step or even worse: wave our hands at a 
potential error. It is our hope that this paper will make you go out and look at 
Lawvere's original paper and his subsequent books. Only the language of cate- 
gory theory can give an exact formulation of the theory and truly encompass all 
the diverse areas that are discussed in this paper. Although we have chosen not 
to employ category theory here, its spirit is nevertheless pervasive throughout. 

This paper is intended to be extremely easy to read. We have tried to make 
use of the same proof pattern over and over again. Whenever possible we use the 
same notation. The examples are mostly disjoint. If the reader is unfamiliar 
with or can not follow one of them, he or she can move on to the next one 
without losing anything. Section 2 states Lawvere's main theorem and some of 
our generalizations. Section 3 has many worked out examples. We start the 
section with the classical paradoxes and then move on to some of the semantic
paradoxes. From there we go on to other examples from theoretical computer
science. Section 4 states the contrapositive of the main theorem and some of its
generalizations. The examples of this contrapositive are in Section 5. We finish
off the paper by looking at some future directions for this work to continue. We 
also list some other examples of limitations and fixed point theorems that might 
be expressible in our scheme. 

We close this introduction with a translation of Cantor's original proof of 
his diagonalization theorem. His language is remarkably reminiscent of our 
language. This translation was taken from Shaughan Lavine's book 

The proof seems remarkable not only because of its simplicity, 
but especially also because the principle that is employed in it can 
be extended to the general theorem, that the powers of well-defined 
sets have no maximum or, what is the same, that for any given set 
L another M can be placed beside it that is of greater power than 
L. 

For example, let $L$ be a linear continuum, perhaps the domain
of all real numerical quantities that are $\geq 0$ and $\leq 1$.

Let $M$ be understood as the domain of all single-valued functions
$f(x)$ that take on only the two values 0 or 1, while $x$ runs through
all real values that are $\geq 0$ and $\leq 1$. [$M = \mathbf{2}^{L}$...]

But $M$ does not have the same power as $L$ either. For otherwise
$M$ can be put into one-to-one correspondence to the variable $z$ [of
$L$], and thus $M$ could be thought of in the form of a single valued
function

$$\varphi(x, z)$$

of the two variables $x$ and $z$, in such a way that through every
specification of $z$ one would obtain an element $f(x) = \varphi(x, z)$ of
$M$ and also conversely each element $f(x)$ of $M$ could be generated
from $\varphi(x, z)$ through a single definite specification of $z$. This however
leads to a contradiction. For if we understand by $g(x)$ that single
valued function of $x$ which takes only values 0 or 1 and which for every
value of $x$ is different from $\varphi(x, x)$, then on the one hand $g(x)$ is an
element of $M$, and on the other it can not be generated from $\varphi(x, z)$
by any specification $z = z_0$, because $\varphi(z_0, z_0)$ is different from $g(z_0)$.

Acknowledgments. The author is grateful to Rohit Parikh for suggesting 
that this paper be written and for his warm encouragement. The author also 
had many helpful conversations with Eva Cogan, Scott Dexter, Mel Fitting, 
Alex Heller, Roman Kossak, Mirco Mannucci, and Paula Whitlock. 

2 Cantor's Theorem and its Generalizations

It is pedagogically sound to skip this section for a moment and read the begin-
ning of the next section where you can remind yourself of the proof of the more
familiar version of Cantor's theorem (about $\mathbb{N} \lneqq \wp(\mathbb{N})$) and Russell's set theory
paradox. Our theorem here might seem slightly abstract at first.

Theorem 1 (Cantor's Theorem) If $Y$ is a set and there exists a function
$\alpha : Y \to Y$ without a fixed point (for all $y \in Y$, $\alpha(y) \neq y$), then for all sets $T$
and for all functions $f : T \times T \to Y$ there exists a function $g : T \to Y$ that
is not representable by $f$, i.e. such that for all $t \in T$

$$g(-) \neq f(-, t).$$

Proof. Let $Y$ be a set and assume $\alpha : Y \to Y$ is a function without fixed
points. There is a function $\Delta : T \to T \times T$ that sends every $t \in T$ to
$(t, t) \in T \times T$. Then construct $g : T \to Y$ as the following composition of three
functions.

$$g : \; T \xrightarrow{\ \Delta\ } T \times T \xrightarrow{\ f\ } Y \xrightarrow{\ \alpha\ } Y$$

In other words,

$$g(t) = \alpha(f(t, t)).$$

We claim that for all $t \in T$, $g(-) \neq f(-, t)$ as functions of one variable. If
$g(-) = f(-, t_0)$ then by evaluation at $t_0$ we have

$$f(t_0, t_0) = g(t_0) = \alpha(f(t_0, t_0))$$

where the first equality is the fact that $g$ is representable and the second equality
is the definition of $g$. But this means that $\alpha$ does have a fixed point. □
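
The proof can be checked mechanically on finite sets. The sketch below uses an arbitrary table for $f$ and the fixed-point-free map $y \mapsto y + 1 \pmod 3$ for $\alpha$; both choices, and the helper names, are assumptions made only for illustration.

```python
import random


def diagonal_g(f, alpha):
    """g = alpha . f . Delta, that is, g(t) = alpha(f(t, t))."""
    return lambda t: alpha(f(t, t))


def is_representable(g, f, T):
    """True if some t0 in T satisfies g(-) = f(-, t0)."""
    return any(all(g(t) == f(t, t0) for t in T) for t0 in T)


T = range(5)
Y = [0, 1, 2]
alpha = lambda y: (y + 1) % 3           # no fixed point on Y

random.seed(0)                           # any table works; seeded for repeatability
table = {(s, t): random.choice(Y) for s in T for t in T}
f = lambda s, t: table[(s, t)]

g = diagonal_g(f, alpha)
print(is_representable(g, f, T))         # prints False
```

Whatever table is drawn, `g` disagrees with column `t0` at the argument `t0`, which is exactly the evaluation step in the proof.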

Remark 1 Obviously, every set with two or more elements has a function to 
itself that does not have a fixed point. It is here that we get in trouble for talking 
about sets and functions as opposed to objects in a category and morphisms 
between those objects. Perhaps $Y$ and $T$ are sets with extra (algebraic) structure
and functions between them are intended to preserve that extra structure. In that 
case, we are really dealing with fewer functions between the sets. 

Remark 2 The $\Delta$ map is called the "diagonal" and many of the proofs are
called "diagonalization arguments." $f$ is some type of evaluation function and
$f(t, t)$ is an evaluation of itself, hence "self-reference" or "self-referential
arguments."



Remark 3 We follow Lawvere and Schanuel [13] in calling this theorem "Cantor's
Theorem" and its contrapositive the "Diagonal Theorem" stated in Section 4.

We generalize the above theorem so that instead of $\Delta = \langle Id, Id \rangle$ we use
$\langle Id, \beta \rangle$ for an arbitrary onto (right invertible) function $\beta : T \to S$. Whereas
$\Delta = \langle Id, Id \rangle : T \to T \times T$ takes every $t$ to $(t, t)$, $\langle Id, \beta \rangle : T \to T \times S$ takes
every $t$ to $(t, \beta(t))$.

The way to think about this theorem is to say that if there is an onto $\beta :
T \to S$ then in a sense $|S| \leq |T|$, and Cantor's theorem says $|T| < |Y^{T}|$, and so
we conclude that $|S| < |Y^{T}|$.

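Theorem 2 Let $Y$ be a set, $\alpha : Y \to Y$ a function without a fixed point, $T$
and $S$ sets and $\beta : T \to S$ a function that is onto (i.e., has a right inverse
$\bar{\beta} : S \to T$), then for all functions $f : T \times S \to Y$ the function $g_\beta : T \to Y$
constructed as follows

$$g_\beta : \; T \xrightarrow{\ \langle Id, \beta \rangle\ } T \times S \xrightarrow{\ f\ } Y \xrightarrow{\ \alpha\ } Y$$

is not representable by $f$.

Proof. Let $Y$, $\alpha$, $T$ and $\beta$ be given. Let $\bar{\beta} : S \to T$ be the right inverse of $\beta$.
By definition

$$g_\beta(t) = \alpha(f(t, \beta(t))).$$

We claim that for all $s \in S$, $g_\beta(-) \neq f(-, s)$. If $g_\beta(-) = f(-, s_0)$ then evaluation
at $\bar{\beta}(s_0)$ gives

$$\begin{aligned}
f(\bar{\beta}(s_0), s_0) &= g_\beta(\bar{\beta}(s_0)) && \text{by representability of } g_\beta \\
&= \alpha(f(\bar{\beta}(s_0), \beta(\bar{\beta}(s_0)))) && \text{by definition of } g_\beta \\
&= \alpha(f(\bar{\beta}(s_0), s_0)) && \text{by definition of right inverse.}
\end{aligned}$$

Which means that $\alpha$ does have a fixed point. □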



We can think of this theorem in another way. Set $S = T$ and let us consider a
$\beta$ different than $Id_T$. The usual way to visualize Cantor's Theorem is as a table
of the values of $f$ in which the entries along the diagonal are enclosed in square
brackets. Everything that is in square brackets gets changed. For example $y_3$ gets
changed to $\alpha(y_3)$. However a little thought shows that we do not need to go along the
diagonal. The diagonal is just the simplest way. What is needed is that every
row of the table gets at least one element changed. So we might have a picture
that looks like this:



[Table: the values of $f$ with rows and columns labelled $t_1, t_2, t_3, t_4, t_5, \ldots$.
The entries that get changed are enclosed in square brackets. They do not lie
along the diagonal, but every row contains at least one of them.]

The fact that every row has something changed is in essence the fact that $\beta$ is
onto. As long as $\beta$ is onto, Cantor's theorem still holds.
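
The role of the onto hypothesis can be seen on a small example. In the sketch below the sets, the function `f` and both choices of $\beta$ are illustrative assumptions; with the onto $\beta$ no column represents $g_\beta$, while with a constant (hence not onto) $\beta$ one column does.

```python
def g_beta(f, alpha, beta):
    """g_beta(t) = alpha(f(t, beta(t)))."""
    return lambda t: alpha(f(t, beta(t)))


def representing_columns(g, f, T, S):
    """All s in S with g(-) = f(-, s)."""
    return [s for s in S if all(g(t) == f(t, s) for t in T)]


T = range(6)
S = range(3)
alpha = lambda y: 1 - y                  # negation on {0, 1}: no fixed point
f = lambda t, s: (t + s) % 2             # hypothetical f : T x S -> {0, 1}

onto_beta = lambda t: t % 3              # hits every element of S
const_beta = lambda t: 0                 # not onto: only column 0 is ever changed

print(representing_columns(g_beta(f, alpha, onto_beta), f, T, S))   # []
print(representing_columns(g_beta(f, alpha, const_beta), f, T, S))  # [1]
```

With `const_beta` only column 0 is altered, so the altered function is free to coincide with column 1.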

With this in mind we may pose - but do not answer - the following questions. 
Should these theorems really be called "diagonalization theorems" ? Does self- 
reference really play a role here? Since we can generate the same paradoxes 
without self-reference, does this destroy Russell's vicious-circle principle? 



3 Instances of Cantor's Theorems 

We shall begin with the familiar version of Cantor's theorem about the power set 
of the natural numbers. From there we move on to Russell's set theory paradox 
and other paradoxes and limitations. We shall do the first two instances slowly 
and use the same notation and ideas as the theorems in the last section. The 
other instances we shall do more quickly. 

Instance: Cantor's $\mathbb{N} \lneqq \wp(\mathbb{N})$ Theorem. The theorem says that there can
not be an onto function from $\mathbb{N}$ to $\wp(\mathbb{N})$. Let $S_0, S_1, S_2, \ldots$ be a proposed
enumeration of all subsets of $\mathbb{N}$. Let $\mathbf{2} = \{0, 1\}$ be a set and consider the
"negation" function $\alpha : \mathbf{2} \to \mathbf{2}$ where $\alpha(0) = 1$ and $\alpha(1) = 0$. Let $f :
\mathbb{N} \times \mathbb{N} \to \mathbf{2}$ be defined as

$$f(n, m) = \begin{cases} 1 & \text{if } n \in S_m \\ 0 & \text{if } n \notin S_m. \end{cases}$$

For each $m$, $f(-, m)$ is the characteristic function of $S_m$:

$$f(-, m) = \chi_{S_m}.$$

Construct $g$ as follows:

$$g : \; \mathbb{N} \xrightarrow{\ \Delta\ } \mathbb{N} \times \mathbb{N} \xrightarrow{\ f\ } \mathbf{2} \xrightarrow{\ \alpha\ } \mathbf{2}.$$

$g$ is the characteristic function of the set

$$G = \{ n \in \mathbb{N} \mid n \notin S_n \}.$$

For all $m$, $\chi_G = g(-) \neq f(-, m) = \chi_{S_m}$. Because if there was an $m_0$ such that
$g(-) = f(-, m_0)$ then by evaluation at $m_0$ we have

$$f(m_0, m_0) = g(m_0) = \alpha(f(m_0, m_0))$$

where the first equality is from the fact that $g$ is representable by $m_0$ and the
second equality is by the definition of $g$. This means that the negation operator
has a fixed point which is clearly false. In other words $G \subseteq \mathbb{N}$ is not in the
proposed enumeration of all subsets of $\mathbb{N}$. □
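
A finite truncation of this instance is easy to run. The list `proposed` below is a made-up stand-in for the first few sets $S_0, S_1, \ldots$ of an enumeration, and `diagonal_set` is an illustrative name.

```python
def diagonal_set(subsets):
    """G = {n | n not in S_n} for a finite list S_0, ..., S_{k-1}."""
    return {n for n, s in enumerate(subsets) if n not in s}


# Hypothetical beginning of a proposed enumeration of subsets.
proposed = [{1, 2}, {0, 1}, {0}, {3}, set()]

G = diagonal_set(proposed)
print(G)              # {0, 2, 4}
print(G in proposed)  # False
```

For every `n`, membership of `n` in `G` is the negation of its membership in `proposed[n]`, so `G` cannot equal any listed set.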

Instance: Russell's Paradox. This paradox says that the set of all sets that 
are not members of themselves is both a member of itself and not a member of 
itself. Let $Sets$ be some universe of sets (we are being deliberately ambiguous
here). Again consider the "negation" function $\alpha : \mathbf{2} \to \mathbf{2}$ where $\alpha(0) = 1$ and
$\alpha(1) = 0$. Let $f : Sets \times Sets \to \mathbf{2}$ be defined as follows on sets $s$ and $t$.

$$f(s, t) = \begin{cases} 1 & \text{if } s \in t \\ 0 & \text{if } s \notin t. \end{cases}$$

We construct $g$ as follows

$$g : \; Sets \xrightarrow{\ \Delta\ } Sets \times Sets \xrightarrow{\ f\ } \mathbf{2} \xrightarrow{\ \alpha\ } \mathbf{2}.$$

$g$ is the characteristic function of those sets that are not a member of themselves.

For all sets $t$, $g(-) \neq f(-, t)$. Because if there was a set $t_0$ such that $g(-) =
f(-, t_0)$ then from evaluation at $t_0$ we get

$$f(t_0, t_0) = g(t_0) = \alpha(f(t_0, t_0))$$

where the first equality is because $g$ is representable and the second equality
is from the definition of $g$. This is plainly false. To summarize, in order to
make sure that there are no paradoxes we must say that $g$ is the characteristic
function of a "collection" of $Sets$ but this "collection" does not form a set.

We mention in passing that the Barber paradox and other simple self- 
referential paradoxes can be done exactly like this. The Barber paradox has 
a simple solution, namely that the village described by the phrase "there is a 
village where everyone who does not shave themselves is shaved by the barber" 
does not really exist. We are in a sense saying the same thing about Russell's 
paradox. Namely, the collection of sets that do not contain themselves does
not form an existent set. For some reason, people find it more ontologically 
disheartening to say that a collection does not form a set than that a particular 
village does not exist. □ 



Instance: Grelling's Paradox. We now move on to some of the semantic 
paradoxes. There are some adjectives that describe themselves and there are 
some that do not. "English" is an English word. "French" is not a French word. 
"Short" is not short and "Long" is not long. "Polysyllabic" is polysyllabic 
but "monosyllabic" is not monosyllabic. Call all words that do not describe 
themselves "heterological." Now ask yourself if "heterological" is heterological. 
It is if and only if it is not. 

Consider the set $Adj$ of all (English) adjectives. We have the following
function $f : Adj \times Adj \to \mathbf{2}$ defined for all adjectives $a_1$ and $a_2$,

$$f(a_1, a_2) = \begin{cases} 1 & \text{if } a_2 \text{ describes } a_1 \\ 0 & \text{if } a_2 \text{ does not describe } a_1. \end{cases}$$

And so we have the following construction of $g$

$$g : \; Adj \xrightarrow{\ \Delta\ } Adj \times Adj \xrightarrow{\ f\ } \mathbf{2} \xrightarrow{\ \alpha\ } \mathbf{2}.$$

$g$ is the characteristic function of a subset (= property) of adjectives that can not
be described by any adjectives. This is exactly what is meant by $g(-) \neq f(-, a)$
for all adjectives $a$. "Heterological" is not the only adjective that is in this
subset. Some authors (e.g. Kleene) have also used the word "impredicable".
Our formulation includes all such paradoxical adjectives. □



Instance: Liar Paradox. The oldest example of a self-referential paradox is 
the (Cretans) liar paradox. Epimenides of Crete said "All Cretans are liars." 
There are many such examples: "This sentence is false.", "I am lying." The Liar
paradox is very similar to Grelling's paradox. Whereas with Grelling's paradox
we dealt with adjectives, here we deal with complete English sentences. Quine's
paradox is the primary example:

'yields falsehood when appended to its own quotation' 
yields falsehood when appended to its own quotation. 

The philosophical literature is full of such examples. Since the formalism is 
similar to Grelling's paradox, we leave it to the reader. □ 



Instance: The Strong Liar Paradox. A common "solution" to the Liar
paradox is to say that there are certain sentences that are neither true nor
false but are meaningless. "I am lying" would be such a sentence. This is a type
of three-valued logic. This is, however, not a "solution." Consider the sentence

'yields falsehood or meaninglessness 
when appended to its own quotation' 
yields falsehood or meaninglessness 
when appended to its own quotation. 

If this sentence is true, then it is false or meaningless. If it is false, then it is true 
and not meaningless. If it is meaningless, then it is true and not meaningless. 

This paradox can also be formulated with our scheme. Consider the set of
English sentences $Sent$ and the set $\mathbf{3} = \{T(rue), M(eaningless), F(alse)\}$. We
have the following function $f : Sent \times Sent \to \mathbf{3}$ defined for all sentences $s_1$
and $s_2$,

$$f(s_1, s_2) = \begin{cases} T & \text{if } s_2 \text{ describes } s_1 \\ M & \text{if it is meaningless for } s_2 \text{ to describe } s_1 \\ F & \text{if } s_2 \text{ does not describe } s_1. \end{cases}$$

Now consider the function $\alpha : \mathbf{3} \to \mathbf{3}$ defined as $\alpha(T) = F$ and $\alpha(M) = \alpha(F) =
T$. Construct $g$ as follows

$$g : \; Sent \xrightarrow{\ \Delta\ } Sent \times Sent \xrightarrow{\ f\ } \mathbf{3} \xrightarrow{\ \alpha\ } \mathbf{3}.$$

$g$ is the characteristic function of sentences that are neither false nor meaningless
when describing themselves. By characteristic function we mean those sentences
that $g$ takes to $T$ as opposed to $M$ or $F$. □



Instance: Richard's Paradox. There are many sentences in the English
language that describe real numbers between 0 and 1. Let us lexicographically
order all English sentences. Using this order, we can select all those English
sentences that describe real numbers between 0 and 1. For example "x is the
ratio between the circumference and the diameter of a circle divided by ten."
describes the number 0.314159.... There are many similar English sentences.
Call such a sentence a "Richard Sentence." So we have the concept of the "m-th
Richard Sentence."

Consider the set $\mathbf{10} = \{0, 1, 2, \ldots, 9\}$ and the function $\alpha : \mathbf{10} \to \mathbf{10}$ defined
as $\alpha(i) = 9 - i$. This function does not have a fixed point. Now consider the
function $f : \mathbb{N} \times \mathbb{N} \to \mathbf{10}$ defined as

$f(n, m)$ = the n-th decimal digit of the number described by the m-th Richard Sentence.

For example, if the sentence in the above paragraph is the 15th Richard sentence
then $f(4, 15) = 1$ because of the second 1 in 0.314159.... Now consider $g : \mathbb{N} \to \mathbf{10}$
constructed as

$$g : \; \mathbb{N} \xrightarrow{\ \Delta\ } \mathbb{N} \times \mathbb{N} \xrightarrow{\ f\ } \mathbf{10} \xrightarrow{\ \alpha\ } \mathbf{10}.$$

This $g$ describes a real number between 0 and 1 and yet for all $m \in \mathbb{N}$

$$g(-) \neq f(-, m)$$

i.e. this number is different than the numbers described by all Richard Sentences.
Yet here is a Richard Sentence that describes this number:

x is the real number between 0 and 1 whose n-th digit is nine minus
the n-th digit of the number described by the n-th Richard sentence.

For reasons that are beyond the author, this paradox remains. □
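
The digit construction can be sketched on a finite table. The rows below are invented digit strings standing in for the numbers described by the first few Richard sentences, and indices start at 0 rather than 1 as in the text.

```python
def diagonal_digits(rows):
    """The n-th digit of the new number is 9 minus the n-th digit of the n-th row."""
    return [9 - row[n] for n, row in enumerate(rows)]


# Hypothetical leading digits of the numbers described by four Richard sentences.
richard = [
    [3, 1, 4, 1],   # 0.3141...
    [5, 0, 0, 0],   # 0.5000...
    [1, 4, 2, 8],   # 0.1428...
    [7, 1, 8, 2],   # 0.7182...
]

g = diagonal_digits(richard)
print(g)  # [6, 9, 7, 7]
```

Since $9 - i \neq i$ for every digit $i$, the new digit string differs from the n-th row in the n-th place.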



Instance: Turing's Halting Problem. The following formulation was in-
spired by Heller's fascinating work on recursion categories [8] and Manin's in-
triguing paper on classical and quantum computations [15].

For this instance we leave the comfortable world of sets and functions. We
must talk about computable universes. A computable universe is a category $\mathbf{U}$
with the following properties:

1. $\mathbb{N}$ and $\mathbf{2}$ are objects in $\mathbf{U}$.

2. For every object $C$ in $\mathbf{U}$ there is some type of enumeration of the elements
of $C$. An enumeration is a total isomorphism $e_C : \mathbb{N} \to C$. One should
think of $C$ as a set of computable things, e.g., trees, graphs, numbers,
stacks, strings etc.






Yanofsky 



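3. For every (not necessarily total) function $f : C \to C$ there is a corre-
sponding number $\ulcorner f \urcorner \in \mathbb{N}$. Think of this as the Gödel number of the
program that computes the function.

4. For every (not necessarily total) function $f : C \to C$ there is a corre-
sponding recursively enumerable (r.e.) set $W_f \subseteq \mathbb{N}$. For every $c \in C$, $f$
has a value at $c$ if and only if $e_C^{-1}(c) \in W_f$. Again one should think of a
partial function from one computable domain to another.

$\mathit{Halt}$ in a computable universe should be a total function $\mathit{Halt} : \mathbb{N} \times \mathbb{N} \to \mathbf{2}$
in $\mathbf{U}$ such that for all $f : C \to C$

$$\mathit{Halt}(-, \ulcorner f \urcorner) = \chi_{W_f}.$$

This says that $\mathit{Halt}$ should be able to tell for what values in $C$ the computation
halts. Formally

$$\mathit{Halt}(n, m) = \begin{cases} 1 & \text{if } n \in W_m \\ 0 & \text{if } n \notin W_m. \end{cases}$$

Consider $\alpha : \mathbf{2} \to \mathbf{2}$ defined as follows: $\alpha(0) = 1$ and $\alpha(1) = \uparrow$, i.e., the
computation is undefined. Construct $g$ as follows:

$$g : \; \mathbb{N} \xrightarrow{\ \Delta\ } \mathbb{N} \times \mathbb{N} \xrightarrow{\ \mathit{Halt}\ } \mathbf{2} \xrightarrow{\ \alpha\ } \mathbf{2}.$$

We conclude by showing that $\mathit{Halt}$ is not total because it is not defined at
$\ulcorner g \urcorner$. If $\mathit{Halt}$ was defined at $\ulcorner g \urcorner$ then we would have the following contradiction:

$$\begin{aligned}
\mathit{Halt}(\ulcorner g \urcorner, \ulcorner g \urcorner) = 1 \quad &\text{iff} \quad \ulcorner g \urcorner \in W_{\ulcorner g \urcorner} && \text{by definition of } \mathit{Halt} \\
&\text{iff} \quad g(\ulcorner g \urcorner) = 1 && \text{by the halting of } g \\
&\text{iff} \quad \mathit{Halt}(\ulcorner g \urcorner, \ulcorner g \urcorner) = 0 && \text{by the definition of } g.
\end{aligned}$$

Hence no total $\mathit{Halt}$ can exist. □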
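
The argument can be imitated in code, under loud assumptions: divergence is modelled by raising a hypothetical `Diverges` exception rather than by actually looping, candidate deciders are ordinary total Python functions, and `code_of_g` is whatever number one pretends is the Gödel number of $g$. This is a toy model of the diagonal step, not a formalization of computable universes.

```python
class Diverges(Exception):
    """Stands in for a computation that never halts (an assumption of this sketch)."""


def alpha(bit):
    # alpha(0) = 1, while alpha(1) is undefined.
    if bit == 0:
        return 1
    raise Diverges


def make_g(halt):
    # g(n) = alpha(halt(n, n))
    return lambda n: alpha(halt(n, n))


def actually_halts(program, n):
    """1 if program(n) returns a value, 0 if it 'diverges'."""
    try:
        program(n)
        return 1
    except Diverges:
        return 0


def refute(halt, code_of_g):
    """Return (claimed, actual) halting behaviour of g on its own code."""
    g = make_g(halt)
    return halt(code_of_g, code_of_g), actually_halts(g, code_of_g)


# Three made-up candidate deciders; each one is wrong about g at its own code.
candidates = [lambda n, m: 1, lambda n, m: 0, lambda n, m: (n + m) % 2]
for halt in candidates:
    print(refute(halt, code_of_g=7))
```

In every printed pair the two entries differ: if the candidate claims that $g$ halts on its own code then $g$ diverges there, and conversely.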



Instance: A non-r.e. Language. There is a language that is not recognized
by any Turing machine. Let $M_0, M_1, M_2, \ldots$ be an enumeration of all Turing
machines on the input language $\Sigma = \{0, 1\}$. Let $w_0, w_1, w_2, \ldots$ be an enumera-
tion of all the words in $\Sigma^*$. If $w_i$ is a word in $\Sigma^*$ we let $\langle w_i \rangle$ denote the numerical
value of the binary word. Consider the following function $f : \Sigma^* \times \Sigma^* \to \mathbf{2}$
defined as follows:

$$f(w_i, w_j) = \begin{cases} 1 & \text{if } w_i \text{ is accepted by } M_{\langle w_j \rangle} \\ 0 & \text{if } w_i \text{ is not accepted by } M_{\langle w_j \rangle}. \end{cases}$$

Then the constructed $g$

$$g : \; \Sigma^* \xrightarrow{\ \Delta\ } \Sigma^* \times \Sigma^* \xrightarrow{\ f\ } \mathbf{2} \xrightarrow{\ \alpha\ } \mathbf{2}$$

is the characteristic function of a language that is not accepted by any Turing
machine. Of course, the fact that there are non-r.e. languages also follows
from a simple counting argument. Namely the number of Turing machines is
countable and the number of languages ($\wp(\Sigma^*)$) is uncountable. □



Instance: An Oracle $B$ such that $P^B \neq NP^B$. One of the major open
questions in computer science is whether or not $P$, the set of all problems that
can be solved by deterministic Turing machines (TMs) in polynomial time, is
equal to the set $NP$ of all problems that can be solved by non-deterministic
TMs in polynomial time. Alas, this question will not be answered in this paper.
However there is a related question that can be answered. Consider the same
question for oracle TMs. An oracle TM is a TM with an associated set $S$, such
that the TM can determine if a word is actually an element of $S$. For a given
set $S$ there are analogous sets $P^S$ and $NP^S$. Baker, Gill and Solovay have
proven that there exists a set $A$ such that $P^A = NP^A$ and there exists a set
$B$ such that $P^B \neq NP^B$. Here we shall prove the second result. Since every
deterministic machine is by definition also nondeterministic, we have for every
$B$, $P^B \subseteq NP^B$. What remains is to show that there is a set $B$ and a language
$L_B$ such that $L_B \in NP^B$ but $L_B \notin P^B$, i.e. $NP^B \not\subseteq P^B$. Our proof was
adapted from [7].

Let $M_0^{?}, M_1^{?}, M_2^{?}, \ldots$ be some enumeration of all the oracle deterministic
polynomial Turing machines in the alphabet $\Sigma = \{0, 1\}$. There is a correspond-
ing sequence of polynomials $p_0(x), p_1(x), p_2(x), \ldots$ expressing the worst execu-
tion time for each machine.

For any function $f : \Sigma^* \times \mathbb{N} \to \mathbf{2}$ and for each $i \in \mathbb{N}$, $f(-, i) : \Sigma^* \to \mathbf{2}$
is a characteristic function on the set $\Sigma^*$. We will often confuse a set and its
characteristic function. Let $\bar{f}(-, i)$ denote the characteristic function of the
complement of $f(-, i)$, i.e., $\bar{f}(-, i)$ is the set that $f(-, i)$ takes to 0. Let $F(-, i)$
denote the cumulative characteristic function

$$F(-, i) = \bigcup_{j < i} \bar{f}(-, j).$$

We shall define $f(-, -)$ inductively. $(\forall w \in \Sigma^*)\, f(w, 0) = 1$. For $w \in \Sigma^*$ and
$i \in \mathbb{N}$, $f(w, i) = 0$ if and only if the following three conditions are satisfied

1. $(\forall w' < w)\, f(w', i) = 1$ where the $<$ is a lexicographical order on the words
of $\Sigma^*$. This ensures that there is only one word accepted to $B$ for each $i$.

2. $M_i^{F(-, i)}$ rejects $0^{|w|}$ within $i^{\log i}$ steps.

3. $(\forall j < i)\, M_j^{F(-, j)}$ on input $0^{|w|}$ does not query $w$ within $j^{\log j}$ steps.

Once this $f$ is defined, we construct $g$ as follows

$$g : \; \Sigma^* \xrightarrow{\ \langle Id, \beta \rangle\ } \Sigma^* \times \mathbb{N} \xrightarrow{\ f\ } \mathbf{2} \xrightarrow{\ \alpha\ } \mathbf{2}$$

where $\beta(w) = |w|$, $\alpha(0) = 1$ and $\alpha(1) = 0$. $g(w) = 1$ if and only if $f(w, |w|) = 0$
if and only if the above three requirements are satisfied.

$g$ is the characteristic function of the set $B \subseteq \Sigma^*$. Now construct the lan-
guage

$$L_B = \{ 0^i \mid B \text{ contains a word of length } i \}.$$

This language can easily be recognized by a linear time nondeterministic TM.
On input $0^i$, the NTM simply has to guess a string $w$ of length $i$ and see if it is
in $B$. Hence $L_B \in NP^B$. In contrast, because of condition 2 above, $L_B$ can not
be recognized by any DTM in polynomial time, i.e., $(\forall m)\, g(-) \neq f(-, m)$. □



4 Diagonal Theorem and Generalizations 

The contrapositive of Cantor's Theorem is of equal importance. 

Theorem 3 (Diagonal Theorem) If $Y$ is a set and there exists a set $T$ and a
function $f : T \times T \to Y$ such that all functions $g : T \to Y$ are representable by
$f$ (there exists a $t \in T$ such that $g(-) = f(-, t)$), then all functions $\alpha : Y \to Y$
have a fixed point.

Proof. The proof is constructive. Let $Y$, $T$, $f$ and $\alpha$ be given. Then we construct
$g$ as follows:

$$g : \; T \xrightarrow{\ \Delta\ } T \times T \xrightarrow{\ f\ } Y \xrightarrow{\ \alpha\ } Y.$$

$g$ is defined as

$$g(m) = \alpha(f(m, m)).$$

Since we have assumed that $g$ is representable by some $t$, we have that

$$g(m) = f(m, t).$$

And so we have a fixed point of $\alpha$ at $y_0 = g(t)$. Explicitly we have

$$\begin{aligned}
g(t) &= f(t, t) && \text{by representation of } g, \text{ hence} \\
\alpha(g(t)) &= \alpha(f(t, t)) && \text{by representation of } g \\
&= g(t) && \text{by definition of } g.
\end{aligned}$$

□
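
The constructive content of the proof can be run on a finite table: search for a column representing the constructed $g$ and, if one is found, read off the fixed point $g(t)$. The table and the two maps below are made-up examples, and `fixed_point_via_diagonal` is an illustrative name. Only the constructed $g$ is required to be representable here.

```python
def fixed_point_via_diagonal(f, alpha, T):
    """If g(m) = alpha(f(m, m)) is represented by some t, return y0 = g(t); else None."""
    g = lambda m: alpha(f(m, m))
    for t in T:
        if all(g(m) == f(m, t) for m in T):
            return g(t)
    return None


T = [0, 1]
table = {(0, 0): 0, (1, 0): 2, (0, 1): 1, (1, 1): 1}   # hypothetical f(m, t)
f = lambda m, t: table[(m, t)]

alpha = {0: 1, 1: 1, 2: 0}.get          # has the fixed point 1
y0 = fixed_point_via_diagonal(f, alpha, T)
print(y0, alpha(y0) == y0)              # 1 True

shift = {0: 1, 1: 2, 2: 0}.get          # no fixed point at all
print(fixed_point_via_diagonal(f, shift, T))   # None
```

For `shift` the search must fail, since a representing column would produce a fixed point that does not exist.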

Remark 4 Obviously, any set Y with two or more elements has functions 
Y — > Y that do not have fixed points. It is here that we get in trouble by 
ignoring the category theory that is necessary. In the examples that we will do, 
the objects we will be dealing with have more structure than just sets and the
functions between the objects are required to preserve that structure. We are 
only talking about these restricted functions. 

Remark 5 It is important to note that the theorem uses a stronger hypothesis 
than the proof actually uses. The theorem asks that all g : T — > Y be repre- 
sentable, however the proof only uses the fact that any g constructed in such a 
manner is representable. In the future, we shall use this fact and only require 
that constructed g be representable. 

5 Instances of Diagonal Theorems 

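We use Mendelson's |T2] notation and language. In particular $\ulcorner \mathcal{B}(x) \urcorner$ is the
Gödel number of $\mathcal{B}(x)$. We shall assume that we are working in a theory where
there is a recursive $D : \mathbb{N} \to \mathbb{N}$ that is defined as follows: For all $\mathcal{B}(x)$ where $\mathcal{B}$
is a logical statement with $x$ its only free variable then

$$D(\ulcorner \mathcal{B}(x) \urcorner) = \ulcorner \mathcal{B}(\ulcorner \mathcal{B}(x) \urcorner) \urcorner.$$

Theorem 4 (Diagonalization Lemma) For any well-formed formula (wf)
$\mathcal{E}(x)$ with $x$ as its only free variable, there exists a closed formula $\mathcal{C}$ such that

$$\vdash \mathcal{C} \longleftrightarrow \mathcal{E}(\ulcorner \mathcal{C} \urcorner).$$

Proof. Let $Lind^i$ be the set of Lindenbaum classes (algebra) of well-formed
formulas with $i$ free variables. Two wfs are equivalent iff they are provably
logically equivalent. Let $f : Lind^1 \times Lind^1 \to Lind^0$ be defined for two wfs
with a free variable $\mathcal{B}(x)$ and $\mathcal{H}(y)$ as follows:

$$f(\mathcal{B}(x), \mathcal{H}(y)) = \mathcal{H}(\ulcorner \mathcal{B}(x) \urcorner).$$

Let the operator on $Lind^0$, $\Phi_{\mathcal{E}} : Lind^0 \to Lind^0$, be defined as $\mathcal{P} \mapsto \Phi_{\mathcal{E}}(\mathcal{P}) =
\mathcal{E}(\ulcorner \mathcal{P} \urcorner)$. Using these functions, we combine them to create $g$ as follows:

$$g : \; Lind^1 \xrightarrow{\ \Delta\ } Lind^1 \times Lind^1 \xrightarrow{\ f\ } Lind^0 \xrightarrow{\ \Phi_{\mathcal{E}}\ } Lind^0.$$

By definition

$$g(\mathcal{B}(x)) = \Phi_{\mathcal{E}}(f(\mathcal{B}(x), \mathcal{B}(x))) = \mathcal{E}(\ulcorner \mathcal{B}(\ulcorner \mathcal{B}(x) \urcorner) \urcorner).$$

We claim that $g$ is representable by $\mathcal{G}(x) = \mathcal{E}(D(x))$. This is true because

$$g(\mathcal{B}(x)) = \mathcal{E}(\ulcorner \mathcal{B}(\ulcorner \mathcal{B}(x) \urcorner) \urcorner) = \mathcal{E}(D(\ulcorner \mathcal{B}(x) \urcorner)) = \mathcal{G}(\ulcorner \mathcal{B}(x) \urcorner) = f(\mathcal{B}(x), \mathcal{G}(y)).$$

So there is a fixed point of $\Phi_{\mathcal{E}}$ at $\mathcal{C} = \mathcal{G}(\ulcorner \mathcal{G}(x) \urcorner)$. Explicitly we have

$$\begin{aligned}
\mathcal{E}(\ulcorner \mathcal{G}(\ulcorner \mathcal{G}(x) \urcorner) \urcorner) &= \Phi_{\mathcal{E}}(\mathcal{G}(\ulcorner \mathcal{G}(x) \urcorner)) && \text{by definition of } \Phi_{\mathcal{E}} \\
&= \Phi_{\mathcal{E}}(f(\mathcal{G}(x), \mathcal{G}(x))) && \text{by definition of } f \\
&= g(\mathcal{G}(x)) && \text{by definition of } g \\
&= f(\mathcal{G}(x), \mathcal{G}(x)) && \text{by representability of } g \\
&= \mathcal{G}(\ulcorner \mathcal{G}(x) \urcorner) && \text{by definition of } f.
\end{aligned}$$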

□ 
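
The shape of this construction can be imitated with strings, in the spirit of Quine's "appended to its own quotation". The following is only an analogy, under stated assumptions: formulas are Python expressions with `@` as the free variable, quotation by `repr` plays the role of Gödel numbering, evaluation by `eval` replaces provable equivalence, and all function names are invented for the sketch.

```python
def quote(formula):
    """Stand-in for Goedel numbering: the Python literal that names the string."""
    return repr(formula)


def diag(formula):
    """D: substitute the quotation of a formula for its own free variable '@'."""
    return formula.replace("@", quote(formula))


def fixed_point(e):
    """For E with free variable '@', return C = G('G') where G(x) = E(D(x))."""
    g = e.replace("@", "diag(@)")
    return diag(g)


c = fixed_point("len(@)")
print(c)
print(eval(c) == len(c))   # C evaluates to E applied to C itself
```

The expression `c` contains a sub-expression that evaluates to the text of `c`, so evaluating `c` applies `len` to `c` itself, just as $\mathcal{C}$ is equivalent to $\mathcal{E}$ applied to its own Gödel number.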



Application: Gödel's First Incompleteness Theorem. Let $\mathit{Prov}(y, x)$
stand for "$y$ is the Gödel number of a proof of a statement whose Gödel number
is $x$." Then let

$$\mathcal{E}(x) \equiv (\forall y)\, \neg \mathit{Prov}(y, x).$$

A fixed point for this $\mathcal{E}(x)$ in a consistent and $\omega$-consistent theory is a sentence
that is equivalent to its own statement of unprovability. □

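Application: Gödel-Rosser's Incompleteness Theorem. Let $\mathrm{Neg} : \mathbb{N} \to \mathbb{N}$
be defined for Gödel numbers as follows:

$$\mathrm{Neg}(\ulcorner \mathcal{B}(x) \urcorner) = \ulcorner \neg\mathcal{B}(x) \urcorner.$$

Let

$$\mathcal{E}(x) = (\forall y)\bigl(\mathrm{Prov}(y,x) \to (\exists w)(w < y \wedge \mathrm{Prov}(w, \mathrm{Neg}(x)))\bigr).$$

A fixed point for this $\mathcal{E}(x)$ in a consistent theory is a sentence that is equivalent
to the statement that every proof of it is preceded by a proof of its negation,
which in a consistent theory again amounts to a statement of its own unprovability. □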
Application: Tarski's Theorem. Let us assume that there exists a well-
formed formula $\mathcal{T}(x)$ that expresses the fact that $x$ is the Gödel number of a
(true) theorem in the theory. Set

$$\mathcal{E}(x) = \neg\mathcal{T}(x).$$

A fixed point of $\mathcal{E}(x)$ shows that $\mathcal{T}(x)$ does not do what it is supposed to do:
the fixed point $\mathcal{C}$ satisfies $\vdash \mathcal{C} \leftrightarrow \neg\mathcal{T}(\ulcorner \mathcal{C} \urcorner)$, so $\mathcal{C}$ holds exactly when $\mathcal{T}$ rejects it.
We conclude that a theory in which the diagonalization lemma holds cannot
express its own theoremhood. □

Application: Parikh Sentences. There are true sentences that have very
long proofs, but there are relatively short proofs of the fact that the sentences
are provable. This amazing result about lengths of proofs can be found on page
496 of R. Parikh's famous paper Existence and Feasibility in Arithmetic [18].
Consider a consistent theory that contains Peano Arithmetic. We shall deal
with the following predicates:

• $\mathrm{Prflen}(m, x)$ = $m$ is the length (in symbols) of a proof of a statement
whose Gödel number is $x$. This is decidable because there are only a finite
number of proofs of length $m$.

• $P(x) = \exists y\,\mathrm{Prov}(y, x)$, i.e. there exists a proof of a statement whose Gödel
number is $x$.

• $\mathcal{E}_n(x) = \neg(\exists m < n\ \mathrm{Prflen}(m,x))$.

Applying the diagonalization lemma to $\mathcal{E}_n(x)$ gives us a fixed point $\mathcal{C}_n$ such that

$$\vdash \mathcal{C}_n \leftrightarrow \mathcal{E}_n(\ulcorner \mathcal{C}_n \urcorner) = \neg(\exists m < n\ \mathrm{Prflen}(m, \ulcorner \mathcal{C}_n \urcorner)).$$

In other words $\mathcal{C}_n$ says

"I do not have a proof of myself shorter than $n$."

If $\mathcal{C}_n$ is false, then there is a proof shorter than $n$ of $\mathcal{C}_n$ and the system is not
consistent.

Consider the following short proof of $P(\mathcal{C}_n)$:

1. If $\mathcal{C}_n$ does not have any proof, then $\mathcal{C}_n$ is true.

2. If $\mathcal{C}_n$ is true, we can check all proofs of length less than $n$ and prove $\mathcal{C}_n$.

3. From 1 and 2 we have that if $\mathcal{C}_n$ does not have a proof, then we can prove
$\mathcal{C}_n$, i.e. $\neg P(\mathcal{C}_n) \to P(\mathcal{C}_n)$.

4. $\therefore P(\mathcal{C}_n)$.

This proof can be formulated in Peano Arithmetic in a fairly short proof. In
contrast, $n$ can be chosen to be fairly large. So we have a statement $\mathcal{C}_n$ which
has a very long proof, but a short proof of the fact that it has a proof. □
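Application: Löb's Paradox. We prove that every logical sentence is true.
The standard notation for the Gödel number of a wff $\mathcal{C}$ is $\ulcorner \mathcal{C} \urcorner$. In contrast, if $n$ is
an integer then we shall write $\lfloor n \rfloor$ for the wff that corresponds to that number.
Obviously $\lfloor \ulcorner \mathcal{C} \urcorner \rfloor = \mathcal{C}$.

Let $\mathcal{A}$ be any sentence. We shall prove that it is always true. Use the
diagonalization lemma on

$$\mathcal{E}(x) = \lfloor x \rfloor \to \mathcal{A}.$$

A fixed point for this $\mathcal{E}(x)$ is a $\mathcal{C}$ such that

$$\vdash \mathcal{C} \leftrightarrow \mathcal{E}(\ulcorner \mathcal{C} \urcorner) = (\lfloor \ulcorner \mathcal{C} \urcorner \rfloor \to \mathcal{A}) = (\mathcal{C} \to \mathcal{A}).$$

So $\mathcal{C}$ is equivalent to $\mathcal{C} \to \mathcal{A}$. Assume, for a second, that $\mathcal{C}$ is true. Then $\mathcal{C} \to \mathcal{A}$
is also true. By modus ponens $\mathcal{A}$ is also true. So by assuming $\mathcal{C}$ we have proven
$\mathcal{A}$. This is exactly what $\mathcal{C} \to \mathcal{A}$ says, and hence it is true, as is its equivalent $\mathcal{C}$,
and so $\mathcal{A}$ is true.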

This looks like a real paradox. It seems to me that the paradox arises because
we did not put a restriction on the wffs $\mathcal{E}(x)$ for which we are permitted to use
the diagonalization lemma. Löb's paradox is related to Curry's paradox,
which shows that we must restrict the comprehension scheme in axiomatic set
theory. □

Let us move from logic to computability theory. We shall use the language
and notation of [4].



Theorem 5 (The Recursion Theorem) Let $h : \mathbb{N} \to \mathbb{N}$ be a total com-
putable function. There exists an $n_0 \in \mathbb{N}$ such that

$$\phi_{h(n_0)} = \phi_{n_0}.$$

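Proof. Let $\mathcal{F}$ be the set of unary computable functions. Let $f : \mathbb{N} \times \mathbb{N} \to \mathcal{F}$
be defined as $f(m,n) = \phi_{\phi_n(m)}$. If $\phi_n(m)$ is undefined, then $f(m,n)$ is also
undefined. Let the operator $\Phi_h : \mathcal{F} \to \mathcal{F}$ be defined as $\Phi_h(\phi_n) = \phi_{h(n)}$.

We have the following square:

$$\mathbb{N} \xrightarrow{\ \Delta\ } \mathbb{N} \times \mathbb{N} \xrightarrow{\ f\ } \mathcal{F} \xrightarrow{\ \Phi_h\ } \mathcal{F}, \qquad g = \Phi_h \circ f \circ \Delta.$$

Here $g$ is defined as $g(m) = \phi_{h(\phi_m(m))}$. By the S-M-N theorem there is a total
computable function $s(m)$ such that $\phi_{s(m)} = \phi_{h(\phi_m(m))}$. Since $s$ is total and
computable, there exists a number $t$ such that $s(m) = \phi_t(m)$ and so $g$ is repre-
sentable because $g(m) = \phi_{h(\phi_m(m))} = \phi_{s(m)} = \phi_{\phi_t(m)} = f(m,t)$. So there is a
fixed point of $\Phi_h$ at $n_0 = \phi_t(t)$. Explicitly we have

$$\begin{aligned}
\phi_{h(\phi_t(t))} &= \Phi_h(\phi_{\phi_t(t)}) && \text{by definition of } \Phi_h \\
&= \Phi_h(f(t,t)) && \text{by definition of } f \\
&= g(t) && \text{by definition of } g \\
&= f(t,t) && \text{by representability of } g \\
&= \phi_{\phi_t(t)} && \text{by definition of } f.
\end{aligned}$$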
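The proof is constructive, and its steps can be traced in a toy setting. The
sketch below is an illustration under our own assumptions: "programs" are
Python source strings for one-argument lambdas (a stand-in for the numbering
$\phi_n$), `run(p, x)` plays the role of $\phi_p(x)$, the string `T` is a program
for $s$, and the transformer `h` is a hypothetical example. None of these names
come from the text.

```python
# T is the text of a program computing s: it maps m to the text of the program
# "on input x, run h(phi_m(m)) on x".
T = 'lambda m: f"lambda x: run(h(run({m!r}, {m!r})), x)"'


def make_run(h):
    """Return run(program, arg) for an environment where programs can call run and h."""
    env = {"h": h}
    env["run"] = lambda program, arg: eval(program, env)(arg)
    return env["run"]


def fixed_point(h):
    """Return (n0, run) with n0 = phi_t(t), so that phi_{h(n0)} = phi_{n0}."""
    run = make_run(h)
    return run(T, T), run


def h(program: str) -> str:
    """Hypothetical total transformer: a program reporting len(program) and its input."""
    return f"lambda x: ({len(program)}, x)"


n0, run = fixed_point(h)
print(run(n0, 5) == run(h(n0), 5))  # n0 and h(n0) agree on this input
print(run(n0, 5) == (len(n0), 5))   # so n0 reports its own length
```

The program `n0` never contains its own text literally; it recomputes that
text by running `T` on `T`, which is the diagonal step $f(t,t)$ of the proof.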



Application: Rice's Theorem. No nontrivial property of computable
functions is decidable. Let $\mathcal{A}$ be a nonempty proper subset of the set of
all unary computable functions. Let $A = \{x \mid \phi_x \in \mathcal{A}\}$. Then $A$ is not recursive.
We prove this by assuming (wrongly) that $A$ is recursive. Let $a \in A$ and $b \notin A$.
Define the function $h$ as follows:

$$h(x) = \begin{cases} a & \text{if } x \notin A \\ b & \text{if } x \in A. \end{cases}$$

By definition $x \in A$ iff $h(x) \notin A$. From our assumption, we have that $h$ is
computable (and total). Hence by the recursion theorem, there is an $n_0$ such
that $\phi_{h(n_0)} = \phi_{n_0}$. Now we have the following contradiction:

$$\begin{aligned}
n_0 \in A &\iff h(n_0) \notin A && \text{by definition of } h \\
&\iff \phi_{h(n_0)} \notin \mathcal{A} && \text{by definition of } A \\
&\iff \phi_{n_0} \notin \mathcal{A} && \text{by the recursion theorem} \\
&\iff n_0 \notin A && \text{by definition of } A.
\end{aligned}$$

□
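The argument can be replayed against any concrete candidate decider. The
following is a sketch under our own assumptions: programs are Python lambda
source strings, the toy property is "returns 0 on input 0", and the candidate
deciders are hypothetical total functions on program texts. The fixed point is
built as in the proof of the recursion theorem.

```python
# Program text for s, as in the recursion theorem.
T = 'lambda m: f"lambda x: run(h(run({m!r}, {m!r})), x)"'

A_PROGRAM = "lambda x: 0"  # a: computes a function with the property
B_PROGRAM = "lambda x: 1"  # b: computes a function without it


def fixed_point(h):
    """Return (n0, run) such that n0 and h(n0) compute the same function."""
    env = {"h": h}
    env["run"] = lambda program, arg: eval(program, env)(arg)
    return env["run"](T, T), env["run"]


def defeat(candidate):
    """Build a program that the candidate decider misjudges."""
    def h(program):
        # Swap: programs judged "in A" go to b, all others go to a.
        return B_PROGRAM if candidate(program) else A_PROGRAM

    n0, run = fixed_point(h)
    claimed = candidate(n0)       # what the candidate says about n0
    actual = run(n0, 0) == 0      # what n0 really does
    return n0, claimed, actual


results = []
for candidate in (lambda p: True, lambda p: False, lambda p: len(p) % 2 == 0):
    _, claimed, actual = defeat(candidate)
    results.append((claimed, actual))
    print(claimed, actual)  # the two values differ for every candidate
```

This does not prove the theorem, which quantifies over all computable
deciders; it only shows the contradiction step acting on particular ones.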



Application: Von Neumann's Self-reproducing Machines. A self-reproducing
machine is a computable function that always outputs its own description. It
might seem impossible to construct such a self-reproducing machine since in
order to construct such a machine, we would need to know its description and
hence know the machine in advance. However, by a simple application of the
recursion theorem, we get such a machine.

By a description of a machine, we could mean the number of the computable
function, i.e. a self-reproducing machine is a function $\phi_n(x) = n$ for all inputs $x$.

Let $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be the computable projection function $f(y, x) = y$.
By the S-M-N theorem there exists a total computable function $s$ such that
$\phi_{s(y)}(x) = f(y, x) = y$. From the recursion theorem, there exists an $n$ such that

$$\phi_n(x) = \phi_{s(n)}(x) = f(n,x) = n. \qquad \Box$$
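A self-reproducing program can be obtained in exactly this way in a toy
setting. The sketch below is an assumption-laden illustration: programs are
Python lambda source strings, `run(p, x)` plays the role of $\phi_p(x)$, and `h`
plays the role of $s$, sending a program text $y$ to a program that outputs $y$
on every input. The names are ours.

```python
# Program text used to build the fixed point, as in the recursion theorem.
T = 'lambda m: f"lambda x: run(h(run({m!r}, {m!r})), x)"'


def h(program: str) -> str:
    """Plays the role of s: a program that ignores its input and returns `program`."""
    return f"lambda x: {program!r}"


env = {"h": h}
env["run"] = lambda program, arg: eval(program, env)(arg)
run = env["run"]

n = run(T, T)                   # fixed point: n and h(n) compute the same function
print(run(n, 0) == n)           # n outputs its own text
print(run(n, "anything") == n)  # regardless of the input
```

As in the text, the machine does not need to know its description in advance;
the description is recomputed at run time from `T`.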
6 Future Directions

There are many possible ways that we can go on with this work. We shall list
a few.

The general Cantor's theorem can be generalized further so that even more
phenomena can be encompassed by this one theorem. For example, what if we
have two sets $Y$ and $Y'$ and there is an onto function from $Y$ to $Y'$? What does this
say about the relationship between $f : T \times T \to Y$ and $f' : T \times T \to Y'$? We
should get the concept of a paradox "reduction" from one paradox to another.

Rather than simply talking about sets and functions, perhaps we should be 
talking about partial orders and order preserving maps. With this generaliza- 
tion, we might be able to not only get fixed point theorems but also least fixed 
point theorems. There are many simple least fixed point theorems such as ones 
for continuous maps of cpo's and Scott domains; Kripke's definition of truth [S] 
and the Knaster-Tarski theorem. 
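To make the notion of a least fixed point concrete, here is a minimal sketch
for the simplest case we know of: a monotone map on the finite lattice of
subsets of a set, iterated from the bottom element. The graph `EDGES` and the
function names are hypothetical and serve only as an example.

```python
def least_fixed_point(step, bottom=frozenset()):
    """Iterate a monotone `step` from the bottom element until it stabilises."""
    current = bottom
    while True:
        following = frozenset(step(current))
        if following == current:
            return current
        current = following


# Hypothetical directed graph given by adjacency lists.
EDGES = {1: [2], 2: [3], 3: [1], 4: [5], 5: []}


def step(reached):
    """Monotone map: node 1, everything in `reached`, and all their successors."""
    return {1} | set(reached) | {m for n in reached for m in EDGES[n]}


reachable = least_fixed_point(step)
print(sorted(reachable))
```

The least fixed point of `step` is the set of nodes reachable from node 1;
the set of all nodes is also a fixed point of `step`, but not the least one.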

Some more thought must go into Richard's and Löb's paradoxes. Although
we have stated their limitations, the paradoxes remain. Perhaps we are not for-
mulating them correctly or perhaps there is something intrinsically problematic
about these paradoxes.

There are many fixed point theorems throughout logic and mathematics 
that are not of the type described in Sections 3 and 4. Can we in some sense 
characterize those fixed point theorems that are self-referential? 

It seems that the key component of the diagonalization lemma is the exis-
tence of a recursive $D : \mathbb{N} \to \mathbb{N}$ that is defined for all $\mathcal{B}(x)$ as

$$D(\ulcorner \mathcal{B}(x) \urcorner) = \ulcorner \mathcal{B}(\ulcorner \mathcal{B}(x) \urcorner) \urcorner.$$

Similarly, in order to have the recursion theorem we needed the S-M-N theorem.
These two properties of systems are the key to the fact that the systems can
talk about themselves. Are these two properties related to each other? More
importantly, can we find other key properties in systems that make self-reference
possible?

In the introduction of this paper we talked of the lack of an onto function
$T \to Y^T$ and we said that $Y$ may be thought of as truth-values or properties
of objects in $T$. Can we find a better word for $Y$? In Section 5 we talked
about an onto function $\mathrm{Lind}^1 \to (\mathrm{Lind}^0)^{\mathrm{Lind}^1}$ where $\mathrm{Lind}^i$ is the Lindenbaum
classes of formulas with $i$ variables. In what sense is $\mathrm{Lind}^0$ the truth-values or
properties of $\mathrm{Lind}^1$? We then went on to talk about an onto function $\mathbb{N} \to \mathcal{F}^{\mathbb{N}}$
where $\mathcal{F}$ is the set of unary computable functions. We used this onto function
to prove the Recursion Theorem. In what sense is $\mathcal{F}$ the truth-values or the
properties of $\mathbb{N}$?
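For finite sets, the lack of an onto function $T \to Y^T$ can be checked by
brute force. The sketch below assumes $T = \{0,1,2\}$ and $Y = \{\text{False}, \text{True}\}$
with negation as a map on $Y$ without fixed points, and uses the same notion of
representability as in Section 5: $g$ is represented by $t_0$ when
$g(t) = f(t, t_0)$ for all $t$. The function names are ours.

```python
from itertools import product


def diagonal(T, f, alpha):
    """Return g(t) = alpha(f(t, t)) as a dictionary indexed by T."""
    return {t: alpha(f(t, t)) for t in T}


def is_represented(T, f, g):
    """True if some t0 satisfies g(t) == f(t, t0) for every t in T."""
    return any(all(g[t] == f(t, t0) for t in T) for t0 in T)


T = range(3)


def negate(y):
    """A map on Y = {False, True} with no fixed point."""
    return not y


# Exhaustively check every f : T x T -> Y (2**9 functions).
all_escaped = True
for values in product([False, True], repeat=9):
    def f(s, t, table=values):
        return table[3 * s + t]

    g = diagonal(T, f, negate)
    if is_represented(T, f, g):
        all_escaped = False

print(all_escaped)
```

For every `f`, the diagonal `g` disagrees with the candidate `t0` at the
argument `t0` itself, which is why no `f` can represent all functions from `T`
to `Y`.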

As for more instances of our theorems, the field is wide open. There are 
many paradoxical phenomena and fixed point theorems that we have not talked 
about. Some of them might be amenable to our scheme and some might
not be.
• There are many of the semantic paradoxes that we did not discuss. The
Berry paradox asks one to consider the sentence "Let x be the first number 
that can not be described by any sentence with less than 200 characters." 
We just described such a number. 

• The Crocodile's Dilemma is an ancient paradox that is a deviously cute 
self-referential paradox. A crocodile steals a child and the mother of the 
child begs for the return of her beloved baby. The crocodile responds "I 
will return the child if and only if you correctly guess whether or not I 
will return your child." The mother cleverly responds that he will keep 
the child. What is an honest crocodile to do?!? 

• There is a belief that all paradoxes would melt away if there were no
self-referential statements. Yablo's Non-self-referential Liar's Paradox was
formulated to counteract that thesis. There is a sequence of statements such
that none of them ever refer to themselves and yet they are all both true
and false. Consider the sequence

$(S_i)$ : For all $k > i$, $S_k$ is untrue.

Suppose $S_n$ is true for some $n$. Then $S_{n+1}$ is false, as are all subsequent
statements. Since all subsequent statements are false, $S_{n+1}$ is true, which
is a contradiction. So in contrast, $S_n$ is false for all $n$. That means that
$S_1$ is true and $S_2$ is true, etc. Again we have a contradiction.

• Brandenburger's Epistemic Paradox [3] considers the situation where

Ann believes that Bob believes that Ann believes that Bob has a
false belief about Ann.

Now ask yourself the following question: Does Ann believe that Bob has
a false belief about Ann? With much thought, you can see that this is a
paradoxical situation.

• The Ackermann function is not a primitive recursive function. One hears 
the phrase that Ackermann's function "diagonalizes-out" of primitive re- 
cursive functions. 

• There is a famous Paris-Harrington result which says that certain general-
ized Ramsey theorems can not be proven in Peano Arithmetic. Kanamori
and McAloon make the connection to the Ackermann function. Just as 
the Ackermann function "diagonalized-out" of primitive recursiveness, so 
too, generalized Ramsey theory is "diagonalized-out" of Peano Arithmetic. 
Both of these are really stating limitations of the systems. 
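For readers who want to see the diagonal behind the phrase "diagonalizes-out",
here is a minimal sketch. It assumes the common two-argument (Ackermann-Péter)
form of the function, which is one of several variants in the literature. For
each fixed $m$ the function $n \mapsto A(m,n)$ is primitive recursive, while the
diagonal $n \mapsto A(n,n)$ eventually outgrows every one of them.

```python
import sys


def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann-Peter function (an assumed variant)."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


# The recursion is deep even for small arguments; raise the limit modestly.
sys.setrecursionlimit(10_000)

diagonal_values = [ackermann(k, k) for k in range(4)]
print(diagonal_values)
```

Only the first few diagonal values are computed here; the next one,
`ackermann(4, 4)`, is far too large to evaluate by this direct recursion.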

There are many instances of fixed point theorems that might be put into the 
form of our scheme. 



• Borodin's Gap Theorem is a type of fixed point theorem in complexity 
theory that might be right for our scheme. 
• We again mention the Knaster-Tarski theorem about monotonic functions
between preorders. There is also a much used theorem about fixed points
of continuous functions between cpo's.

• As the ultimate in self-reference, we would like to mention Kripke's theory 
of truth that he used to banish self-referential paradoxes. It is, in essence, 
a type of fixed point theorem. It would really be nice to formulate that 
way of dealing with paradoxes in our language. 

• Brouwer's fixed point theorem, or the far simpler intermediate value the- 
orem. 

• Nash's equilibria theorem and its many generalizations from game theory. 

There are several theorems from "real" mathematics that are proved via 
diagonalization proofs. We might be able to put them into our language. 

• Baire's category theorem about metric spaces.

• Montel's theorem from complex function theory. 

• Ascoli's theorem from topology.

• Helly's theorem about limits of distributions.

The following ideas are a little more "spacey."

• Gödel's second incompleteness theorem about the unprovability within
arithmetic of the consistency of arithmetic. This theorem is a simple
consequence of the first incompleteness theorem. However, Kreisel has
a direct model-theoretic proof that uses a diagonal method (see, e.g.,
page 860 of Smoryński's article in ^.) This proof seems amenable to our
scheme.

• Many of Chaitin's algorithmic information theory arguments seem to fit 
our scheme. 

• We worked out Gödel's first incompleteness theorem which showed that
(using the language of the introduction) arithmetic can not completely talk
about its own provability. What about Gödel's completeness theorem?
Certain weak systems can completely talk about their own provability.
Can this be stated as some type of fixed point theorem?